Abstract

In compressive sensing, a small collection of linear projections of a sparse signal contains enough information to 
permit signal recovery. Distributed compressive sensing (DCS) extends this framework, allowing a correlated ensemble 
of sparse signals to be jointly recovered from a collection of separately acquired compressive measurements. In this 
paper, we introduce an ensemble sparsity model for capturing the intra- and inter-signal correlations within a collection 
of sparse signals. For strictly sparse signals obeying an ensemble sparsity model, we characterize the fundamental 
number of noiseless measurements that each sensor must collect to ensure that the signals are jointly recoverable. 
Our analysis is based on a novel bipartite graph representation that links the sparse signal coefficients with the 
measurements obtained for each signal. 
I. Introduction

A unique framework for signal sensing and compression has recently developed under the rubric of compressive
sensing (CS). CS builds on the work of Candès, Romberg, and Tao [1] and Donoho [2], who showed that if a
signal $x \in \mathbb{R}^N$ can be expressed as a sparse superposition of just $K < N$ elements from some dictionary, then
it can be recovered from a small number of linear measurements $y = \Phi x$, where $\Phi$ is a measurement matrix of
size $M \times N$, and $M < N$. One intriguing aspect of CS is that randomly chosen measurement matrices can be
remarkably effective for nonadaptively capturing the information in sparse signals. In fact, if $x$ is a fixed $K$-sparse
signal and just $M = K + 1$ random measurements are collected via a matrix $\Phi$ with independent and identically
distributed (i.i.d.) Gaussian entries, then with probability one $x$ is the unique $K$-sparse solution to $y = \Phi x$ [3].
While there are no tractable recovery algorithms that guarantee recovery when so few measurements are collected,
there do exist a variety of practical and provably effective algorithms [1, 2, 4, 5] that work when $M = O(K \log N)$.

The current CS theory has been designed mainly to facilitate the sensing and recovery of a single signal $x \in \mathbb{R}^N$.
It is natural to ask whether CS could help alleviate the burdens of acquiring and processing high-dimensional data
in applications involving multiple sensors. Some work to date has answered this question in the affirmative. For
example, if the entries of an unknown vector $x \in \mathbb{R}^N$ are spread among a field of sensors (e.g., if $x$ represents a
concatenation of the ambient temperatures recorded by $N$ sensors at a single instant), then certain protocols have
been proposed for efficiently computing $y = \Phi x$ through proper coordination of the sensors [6-9]. Given $y$, standard
CS recovery schemes can then be used to recover $x$ using a model for its sparse structure.

It is interesting, however, to consider cases where each sensor observes not a single scalar value but rather a
longer vector. For example, consider an ensemble of signals $x_1, x_2, \ldots, x_J \in \mathbb{R}^N$ observed by a collection of $J$
sensors, where each sensor $j \in \{1, 2, \ldots, J\}$ observes only signal $x_j$ (e.g., $x_j$ might represent a time series of
$N$ temperature recordings at sensor position $j$). In such a scenario, one could employ CS on a sensor-by-sensor
basis, recording random measurements $y_j = \Phi_j x_j$ of each signal, and then reconstructing each signal $x_j$ from the
measurements $y_j$. Such an approach would exploit intra-signal correlations (manifested in a sparse model for each
$x_j$), but would not exploit any inter-signal correlations that may exist among the signals $x_j$.

Motivated by this observation, we have proposed a framework known as distributed compressive sensing (DCS)
that allows the exploitation of both intra- and inter-signal correlation structures.^1 In a typical DCS scenario, each
sensor separately collects measurements $y_j = \Phi_j x_j$ as described above, but these measurements are then transmitted
to a single collection point (a single "decoder") where the ensemble of signals is reconstructed jointly using a
model that characterizes correlations among the sparse signals. By exploiting the inter-signal correlations, DCS
allows the overall measurement burden to be shared among the $J$ sensors; in other words, the signal ensemble
can be reconstructed jointly from significantly fewer measurements than would be required if each signal were
reconstructed individually. Although we do not go into the details here, one can make interesting connections
between DCS and the Slepian-Wolf framework for distributed source coding, in which correlated random sources
can each be encoded below their nominal entropy rate if they are decoded jointly [3, 11, 16, 17].

^1 Our prior work in DCS is contained in two technical reports [3, 10] and several conference publications [11-15].

As mentioned above, any DCS decoder must rely on a correlation model that describes the anticipated structure 
within and among the signals in the ensemble. There are many conceivable ways in which correlations can be 
described among a collection of sparse signals. We have previously proposed [3, 11-14] several models for capturing 
such correlations and studied each model in isolation, developing a variety of practical reconstruction algorithms 
and theoretical arguments customized to the nuances of each model. The goal of this paper is to develop a broader,
general-purpose framework for quantifying the sparsity of an ensemble of correlated signals. In Section II, we
introduce a factored representation of the signal ensemble that decouples its location information from its value 
information: a single vector encodes the values of all nonzero signal entries, while a binary matrix maps these values 
to the appropriate locations in the ensemble. We term the resulting models ensemble sparsity models (ESMs). ESMs 
are natural and flexible extensions of single-signal sparse models; in fact, the ensemble correlation model proposed 
in [18] and all of our previously proposed models fit into the ESM framework as special cases. 

The bulk of this paper (Section III, Section IV, and several supporting appendices) is dedicated to answering a
fundamental question regarding the use of ESMs for DCS: how many measurements must each sensor collect to 
ensure that a particular signal ensemble is recoverable? Not surprisingly, this question is much more difficult 
to answer in the multi-signal case than in the single-signal case. For this reason, we focus not on tractable 
recovery algorithms or robustness issues but rather on the foundational limits governing how the measurements 
can be amortized across the sensors while preserving the information required to uniquely identify the sparse signal 
ensemble. To study these issues, we introduce a bipartite graph representation generated from the ESM that reflects, 
for each measurement, the sparse signal entries on which it depends. Our bounds relate intimately to the structure of 
this graph. While our previous work in DCS has helped inspire a number of algorithms for recovery of real-world 
signal ensembles [19-21], we believe that the results in this paper and the analytical framework that we introduce 
will help establish a solid foundation for the future development of DCS theory.

II. Ensemble Sparsity Signal Models 

In this section, we propose a general framework to quantify the sparsity of an ensemble of correlated signals. 
Our approach is based on a factored representation of the signal ensemble that decouples its location information 
from its value information. Later, in Section III, we explain how the framework can be used in the joint recovery of 
sparse signals from compressive measurements, and we describe how such measurements can be allocated among 
the sensors. 

A. Notation and Definitions 

We use the following notation for signal ensembles. Let $\Lambda := \{1, 2, \ldots, J\}$ index the $J$ signals in the ensemble.
For a subset $\Gamma \subseteq \Lambda$, we define $\Gamma^C := \Lambda \setminus \Gamma$. Denote the signals in the ensemble by $x_j$, with $j \in \Lambda$. We assume
that each signal $x_j \in \mathbb{R}^N$, and we let

$$X = [x_1^T \; x_2^T \; \cdots \; x_J^T]^T \in \mathbb{R}^{JN}$$

denote the concatenation of the signals. For a given vector $v$, we use $v(n)$ to denote the $n$th entry of $v$, and we
use the $\ell_0$ "norm" $\|v\|_0$ to denote the number of nonzero entries in $v$. Conventionally, $\|v\|_0$ is referred to as the
sparsity of the vector $v$; we elaborate on this point below and discuss natural extensions of the concept of sparsity
to multi-signal ensembles. Finally, we will also refer to the following definition.

Definition 1: For any nonnegative integers $L_1 \le L_2$, an $L_2 \times L_1$ identity submatrix is a matrix constructed by
selecting $L_1$ columns from the $L_2 \times L_2$ identity matrix $I_{L_2 \times L_2}$; selected columns need not be adjacent, but
their order is preserved from left to right.

B. Sparse Modeling for a Single Signal 

To motivate the use of a factored representation for modeling sparsity, we begin by considering the structure of
a single sparse signal $x \in \mathbb{R}^N$ that has $K < N$ nonzero entries. We note that the degrees of freedom in such a
signal are captured in the $K$ locations where the nonzero coefficients occur and in the $K$ nonzero values at these
locations. It is possible to decouple the location information from the value information by writing $x = P\theta$, where
$\theta \in \mathbb{R}^K$ contains only the nonzero entries of $x$, and where $P$ is an $N \times K$ identity submatrix that includes $x$ in its
column span. Any $K$-sparse signal can be written in this manner.

In light of the above, to model the set of all possible sparse signals, define $\mathcal{P}$ to be the set of all identity
submatrices of all possible sizes $N \times K'$, with $1 \le K' \le N$. We refer to $\mathcal{P}$ as a sparsity model, because the
concept of sparsity can in fact be defined within the context of this model. To be specific, given an arbitrary
signal $x \in \mathbb{R}^N$, one can consider all possible factorizations $x = P\theta$ with $P \in \mathcal{P}$. Among these factorizations, the
dimensionality of the unique smallest representation $\theta$ equals the sparsity level of the signal $x$; in other words, we
will have $\dim(\theta) = \|x\|_0$.
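As a small illustration of the factorization $x = P\theta$, the following sketch (our own, not taken from the paper) builds the identity submatrix from the support of a toy signal. The helper names `identity_submatrix` and `factor_sparse_signal` and the example vector are assumptions made for illustration only.

```python
import numpy as np

def identity_submatrix(n_rows, columns):
    """Keep the listed columns (0-based) of the n_rows x n_rows identity,
    preserving their left-to-right order (cf. Definition 1)."""
    return np.eye(n_rows)[:, sorted(int(c) for c in columns)]

def factor_sparse_signal(x):
    """Return (P, theta) with x = P @ theta and theta holding only the nonzeros."""
    support = np.flatnonzero(x)               # locations of the nonzero entries
    P = identity_submatrix(x.size, support)   # location information
    theta = x[support]                        # value information
    return P, theta

# Hypothetical toy signal with N = 5 and K = 2.
x = np.array([0.0, 2.5, 0.0, -1.0, 0.0])
P, theta = factor_sparse_signal(x)
print(P.shape, theta)
```

Here the number of columns of `P` equals the number of nonzero entries of `x`, which is the smallest value-vector dimension among all such factorizations.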

C. Sparse Modeling for a Signal Ensemble 

We generalize the formulation of Section II-B to the signal ensemble case by considering factorizations of the
form $X = P\Theta$, where $X \in \mathbb{R}^{JN}$ represents the entire signal ensemble as defined above, $P$ is a matrix of size
$JN \times Q$ for some integer $Q$, and $\Theta \in \mathbb{R}^Q$. In any such factorization, we refer to $P$ and $\Theta$ as the location matrix
and value vector, respectively.

Definition 2: An ensemble sparsity model (ESM) is a set $\mathcal{P}$ of admissible location matrices $P$; the number of
columns among the $P \in \mathcal{P}$ may vary, but each has $JN$ rows.

As we discuss further below, there are a number of natural choices for what should constitute a valid location
matrix $P$, and consequently, there are a number of possible ESMs that could be used to describe the correlations
among sparse signals in an ensemble.
For a fixed ESM, not every matrix $P \in \mathcal{P}$ can be used to generate a given signal ensemble $X$.

^2 We consider for the sake of illustration, but without loss of generality, signals that are sparse in the canonical basis. All of our analysis
can be easily extended to signals that are sparse in any orthonormal basis.

Definition 3: For a given ensemble $X$ and ESM $\mathcal{P}$, the set of feasible location matrices is

$$\mathcal{P}_F(X) := \{P \in \mathcal{P} \ \text{s.t.} \ X \in \mathrm{colspan}(P)\},$$

where $\mathrm{colspan}(P)$ denotes the column span of $P$.

Note that $\mathcal{P}_F(X) \subseteq \mathcal{P}$.

Definition 4: In the context of an ESM $\mathcal{P}$, the ensemble sparsity level of a signal ensemble $X$ is

$$D = D(X, \mathcal{P}) := \min_{P \in \mathcal{P}_F(X)} \dim(\mathrm{colspan}(P)).$$

When $P$ is full-rank, the dimension of its column span is equal to its number of columns; we will expand on this
property in Section II-E. For many ESMs, we may expect to have $D < \sum_{j \in \Lambda} \|x_j\|_0$.

D. Common/Innovation Location Matrices 

There are a number of natural choices for the location matrices $P$ that could be considered for sparse modeling of a
signal ensemble. In this paper (as we studied earlier in [3]), we are interested in the types of multi-signal correlations 
that arise when a number of sensors observe a common phenomenon (which may have a sparse description) and 
each of those same sensors observes a local innovation (each of which may also have a sparse description). To 
support the analysis of such scenarios, we restrict our attention in this paper to location matrices P of the form 

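$$P = \begin{bmatrix} P_C & P_1 & 0 & \cdots & 0 \\ P_C & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ P_C & 0 & 0 & \cdots & P_J \end{bmatrix}, \tag{1}$$

where $P_C$ and each $P_j$, $j \in \Lambda$, are identity submatrices with $N$ rows, and where each $0$ denotes a matrix of
appropriate size with all entries equal to 0. For a given matrix $P$ of this form, let $K_C(P)$ denote the number of
columns of the element $P_C$ contained in $P$, and for each $j \in \Lambda$, let $K_j(P)$ denote the number of columns of $P_j$.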

Let us explain why such location matrices are conducive to the analysis of signals sharing the common/innovation
structure mentioned above. When a signal ensemble $X \in \mathbb{R}^{JN}$ is expressed as $X = P\Theta$ for some $P$ of the form
(1), we may partition $\Theta$ into the corresponding components

$$\Theta = [\theta_C^T \; \theta_1^T \; \theta_2^T \; \cdots \; \theta_J^T]^T,$$

where $\theta_C \in \mathbb{R}^{K_C(P)}$ and each $\theta_j \in \mathbb{R}^{K_j(P)}$. Then, letting

$$z_C := P_C \theta_C \quad \text{and} \quad z_j := P_j \theta_j \ \text{for each } j \in \Lambda, \tag{2}$$

we can write each signal in the ensemble as

$$x_j = z_C + z_j,$$

where the common component $z_C$ has sparsity $K_C(P)$ and is present in each signal and the innovation components
$z_1, z_2, \ldots, z_J$ have sparsities $K_1(P), K_2(P), \ldots, K_J(P)$, respectively, and are unique to the individual signals.
Example 1: Consider $J = 2$ signals of dimension $N = 4$ each, specifically $x_1 = [3 \; 1 \; 0 \; 0]^T$ and $x_2 = [1 \; 1 \; 0 \; 0]^T$.
Different choices of $P$ can account for the common structure in $x_1$ and $x_2$ in different ways. For example, we
could take

$$P_C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad P_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \text{and} \quad P_2 = [\,], \tag{3}$$

in which case we can write $X = P\Theta$ by taking $\Theta = [1 \; 1 \; 2]^T$. Under this choice of $P$, we have $z_C = [1 \; 1 \; 0 \; 0]^T$,
$z_1 = [2 \; 0 \; 0 \; 0]^T$, and $z_2 = [0 \; 0 \; 0 \; 0]^T$, and the sparsity levels for the respective components are $K_C(P) = 2$,
$K_1(P) = 1$, and $K_2(P) = 0$. Alternatively, we could build $P$ from the blocks

$$P_C' = P_C, \quad P_1' = P_2, \quad \text{and} \quad P_2' = P_1, \tag{4}$$

in which case we can write $X = P\Theta$ by taking $\Theta = [3 \; 1 \; {-2}]^T$. Under this choice of $P$, we have $z_C = [3 \; 1 \; 0 \; 0]^T$,
$z_1 = [0 \; 0 \; 0 \; 0]^T$, and $z_2 = [-2 \; 0 \; 0 \; 0]^T$, and the sparsity levels for the respective components are $K_C(P) = 2$,
$K_1(P) = 0$, and $K_2(P) = 1$.
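The two factorizations in Example 1 can be checked numerically. The sketch below is our own illustration and assumes the signals read $x_1 = [3 \; 1 \; 0 \; 0]^T$ and $x_2 = [1 \; 1 \; 0 \; 0]^T$; the helper `location_matrix` and its 0-based support arguments are hypothetical names, not part of the paper.

```python
import numpy as np

def identity_submatrix(n_rows, columns):
    """Columns (0-based) of the n_rows x n_rows identity; may be empty."""
    return np.eye(n_rows)[:, [int(c) for c in columns]]

def location_matrix(n, common, innovations):
    """Assemble a common/innovation location matrix of the form (1)."""
    P_C = identity_submatrix(n, common)
    block_rows = []
    for j in range(len(innovations)):
        blocks = [P_C]
        for i, support in enumerate(innovations):
            P_i = identity_submatrix(n, support)
            # Sensor j sees only its own innovation block; the rest are zero.
            blocks.append(P_i if i == j else np.zeros_like(P_i))
        block_rows.append(np.hstack(blocks))
    return np.vstack(block_rows)

X = np.array([3.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # [x_1; x_2]

# Choice (3): common support {1, 2}, innovation at index 1 of sensor 1 only.
P_a = location_matrix(4, [0, 1], [[0], []])
Theta_a = np.array([1.0, 1.0, 2.0])

# Choice (4): same common support, innovation moved to sensor 2.
P_b = location_matrix(4, [0, 1], [[], [0]])
Theta_b = np.array([3.0, 1.0, -2.0])

print(np.allclose(P_a @ Theta_a, X), np.allclose(P_b @ Theta_b, X))
```

Both location matrices have three columns and full column rank, so the two factorizations describe the same ensemble with the same number of value-vector entries.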



E. Common/Innovation ESMs 

In this paper, we restrict our attention to ESMs that are populated only with a selection of the common/innovation 
location matrices described in Section II-D. 
Definition 5: An ESM $\mathcal{P}$ is called a common/innovation ESM if every $P \in \mathcal{P}$ has the form (1) and is full-rank.

The requirement that each $P \in \mathcal{P}$ have full rank forbids any $P$ for which $P_C$ and all $\{P_j\}_{j \in \Lambda}$ have one or more
columns in common; it is natural to omit such matrices, since a full-rank matrix of the form (1) could always be
constructed with equivalent column span by removing each shared column from $P_C$ or any one of the $P_j$.

Depending on the type of structure one wishes to characterize within an ensemble, a common/innovation ESM
$\mathcal{P}$ could be populated in various ways. For example:

• One could allow $\mathcal{P}$ to contain all full-rank matrices $P$ of the form (1). This invokes a sparse model for both
the common and innovation components.

• Or, one could consider only full-rank matrices $P$ of the form (1) where $P_C = I_{N \times N}$. This removes the
assumption that the common component is sparse.

• Alternatively, one could consider only full-rank matrices $P$ of the form (1) where $P_C = [\,]$ and where
$P_1 = P_2 = \cdots = P_J$. This model assumes that no common component is present, but that the innovation
components all share the same sparse support.

• Finally, one could consider only full-rank matrices $P$ of the form (1) where $P_C = [\,]$ and where all of the
matrices $P_j$ share some minimum number of columns in common. This model assumes that all innovation
components share some support indices in common.
We have previously studied each of the first three cases above [3, 11-14], proposing a variety of practical
reconstruction algorithms and theoretical arguments customized to the nuances of each model. Later, the fourth case
above was proposed and studied in [18]. In this paper, however, we present a unified formulation, treating each
model as a special case of the more general common/innovation ESM framework. Consequently, the theoretical
foundation that we develop starting in Section III is agnostic to the choice of which matrices $P$ of the form (1) are
chosen to populate a given ESM $\mathcal{P}$ under consideration, and therefore our results apply to all of the cases in [3,
11-14, 18].

III. Distributed Measurement Bounds 

In this section, we present our main results concerning the measurement and reconstruction of signal ensembles 
in the context of common/innovation ESMs. 



A. Distributed Measurements 

We focus on the situation where distributed measurements of the signals in an ensemble $X \in \mathbb{R}^{JN}$ are collected.
More precisely, for each $j \in \Lambda$, let $\Phi_j$ denote a measurement matrix of size $M_j \times N$, and let $y_j = \Phi_j x_j$ represent
the measurements collected of component signal $x_j$. When appropriate below, we make explicit an assumption that
the matrices $\Phi_j$ are drawn randomly with i.i.d. Gaussian entries, though other random distributions could also be
considered.

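Suppose that the collection of measurements

$$Y = [y_1^T \; y_2^T \; \cdots \; y_J^T]^T \in \mathbb{R}^{\sum_{j=1}^{J} M_j}$$

is transmitted to some central node for reconstruction. Defining

$$\Phi = \begin{bmatrix} \Phi_1 & 0 & \cdots & 0 \\ 0 & \Phi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_J \end{bmatrix} \in \mathbb{R}^{\left(\sum_{j=1}^{J} M_j\right) \times JN},$$

we may write $Y = \Phi X$. In the context of a common/innovation ESM $\mathcal{P}$, we are interested in characterizing
the requisite numbers of measurements $M_1, M_2, \ldots, M_J$ that will permit the decoder to perfectly reconstruct the
ensemble $X$ from $Y$.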
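The identity $Y = \Phi X$ simply says that stacking the per-sensor measurements $y_j = \Phi_j x_j$ is the same as applying one block-diagonal matrix to the concatenated ensemble. A minimal sketch, with arbitrary sizes and random data chosen by us for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6                      # signal length (hypothetical)
M = [3, 2, 4]              # measurements per sensor (hypothetical), J = 3

signals = [rng.standard_normal(N) for _ in M]
Phis = [rng.standard_normal((m, N)) for m in M]

# Each sensor measures only its own signal.
Y_separate = np.concatenate([Phi_j @ x_j for Phi_j, x_j in zip(Phis, signals)])

# Equivalent joint description: block-diagonal Phi applied to X = [x_1; ...; x_J].
Phi = np.zeros((sum(M), N * len(M)))
row = 0
for j, Phi_j in enumerate(Phis):
    Phi[row:row + Phi_j.shape[0], j * N:(j + 1) * N] = Phi_j
    row += Phi_j.shape[0]
X = np.concatenate(signals)
Y_joint = Phi @ X

print(np.allclose(Y_separate, Y_joint))
```

The block-diagonal structure is what makes the acquisition distributed: no sensor needs access to any other sensor's signal.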



B. Reconstruction of a Value Vector 

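Let us begin by considering the case where the decoder has knowledge of some full-rank location matrix $P \in \mathcal{P}_F(X)$.
In this case, perfect reconstruction of the ensemble $X$ is possible if the decoder can identify the unique
value vector $\Theta$ such that $X = P\Theta$.

To understand when perfect reconstruction may be possible, note that for any $\Theta$ such that $X = P\Theta$, we can
write

$$Y = \Phi X = \Phi P \Theta = \begin{bmatrix} \Phi_1 P_C & \Phi_1 P_1 & 0 & \cdots & 0 \\ \Phi_2 P_C & 0 & \Phi_2 P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \Phi_J P_C & 0 & 0 & \cdots & \Phi_J P_J \end{bmatrix} \begin{bmatrix} \theta_C \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_J \end{bmatrix}. \tag{5}$$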



To ensure that $\Theta$ can be uniquely recovered from $Y$, certain conditions must be met. For example, it is clear that
the total number of measurements cannot be smaller than the total number of unknowns, i.e., that we must have

$$\sum_{j=1}^{J} M_j \ge \dim(\Theta) = K_C(P) + \sum_{j=1}^{J} K_j(P). \tag{6}$$

However, only certain distributions of these measurements among the sensors will actually permit recovery. For
example, the component $\theta_j \in \mathbb{R}^{K_j(P)}$ is measured only by sensor $j$, and so we require that

$$M_j \ge K_j(P) \tag{7}$$

for each $j \in \Lambda$. Taken together, conditions (6) and (7) state that each sensor must collect enough measurements to
allow for recovery of the local innovation component, while the sensors collectively must acquire at least $K_C(P)$
extra measurements to permit recovery of the common component. While these conditions are indeed necessary
for permitting recovery of $\Theta$ from $Y$ (see Theorem 2), they are not sufficient: there are additional restrictions
governing how these extra measurements may be allocated to permit recovery of the common component.

To appreciate the reason for these additional restrictions, consider the case where for some indices $n \in \{1, 2, \ldots, N\}$
and $j \in \Lambda$, row $n$ of $P_C$ contains a 1 and row $n$ of $P_j$ contains a 1. Recalling the definitions of $z_C$ and $z_j$ from
(2), this implies that both $z_C(n)$ and $z_j(n)$ have a corresponding entry in the unknown value vector $\Theta$. In such
an event, however, it is impossible to recover the values of both $z_C(n)$ and $z_j(n)$ from measurements of $x_j$ alone
because these pieces of information are added into the single element $x_j(n) = z_C(n) + z_j(n)$. Intuitively, since
the correct value for $z_j(n)$ can only be inferred from $y_j$, it seems that the value $z_C(n)$ can only be inferred using
measurements of other signals that do not feature the same overlap, i.e., from those $y_{j'}$ such that row $n$ of $P_{j'}$
contains all zeros.

Based on the considerations above, we propose the following definition. 

Definition 6: For a given location matrix $P$ belonging to a common/innovation ESM $\mathcal{P}$ and a given set of signals
$\Gamma \subseteq \Lambda$, the overlap size $K_C(\Gamma, P)$ is the number of indices in which there is overlap between the common and
innovation component supports at all signals $j \in \Gamma^C$:

$$K_C(\Gamma, P) := \left| \{ n \in \{1, \ldots, N\} : \text{row } n \text{ of } P_C \text{ is nonzero and } \forall j \in \Gamma^C, \text{ row } n \text{ of } P_j \text{ is nonzero} \} \right|. \tag{8}$$

We note that $K_C(\Lambda, P) = K_C(P)$ and $K_C(\emptyset, P) = 0$.
Relating to our discussion above, for each entry $n \in \{1, \ldots, N\}$ counted in $K_C(\Gamma, P)$, we expect that some sensor
in $\Gamma$ must take one extra measurement to account for that entry of the common component; it is impossible to
recover such entries from measurements made only by sensors outside $\Gamma$. Our first main result confirms that ensuring
the sensors in every $\Gamma \subseteq \Lambda$ collectively acquire at least $K_C(\Gamma, P)$ extra measurements is indeed sufficient to permit
recovery of $\Theta$ from $Y$.
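The overlap size of Definition 6 depends only on the supports encoded in $P_C$ and the $P_j$, so it can be computed directly from index sets. The sketch below is our own; it represents supports as Python sets with 0-based indices (an assumption of this illustration) and evaluates the location matrix (3) of Example 1.

```python
def overlap_size(gamma, common_support, innovation_supports):
    """K_C(Gamma, P): number of common-support indices that also appear in the
    innovation support of every sensor outside Gamma (sensors are 0-based)."""
    outside = [j for j in range(len(innovation_supports)) if j not in gamma]
    return sum(
        all(n in innovation_supports[j] for j in outside)
        for n in common_support
    )

# Location matrix (3): common support {1, 2}; sensor 1 has an innovation at
# index 1, sensor 2 has none (written 0-based below).
common_support = {0, 1}
innovation_supports = [{0}, set()]

for gamma in [set(), {0}, {1}, {0, 1}]:
    print(sorted(gamma), overlap_size(gamma, common_support, innovation_supports))
```

When `gamma` contains every sensor the condition over outside sensors is vacuous, so the function returns the full common sparsity; when `gamma` is empty, the full-rank requirement on $P$ forces the count to zero.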

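Theorem 1: (Achievable, known $P$) Let $X$ denote a signal ensemble, and let $P \in \mathcal{P}_F(X)$ be a full-rank location
matrix in a common/innovation ESM $\mathcal{P}$. For each $j \in \Lambda$, let $\Phi_j$ be a random $M_j \times N$ matrix populated with i.i.d.
Gaussian entries. If

$$\sum_{j \in \Gamma} M_j \ge \left( \sum_{j \in \Gamma} K_j(P) \right) + K_C(\Gamma, P) \tag{9}$$

for all subsets $\Gamma \subseteq \Lambda$, then with probability one over $\{\Phi_j\}_{j \in \Lambda}$, there exists a unique solution $\widehat{\Theta}$ to the system of
equations $Y = \Phi P \widehat{\Theta}$, and hence, letting $\widehat{X} := P \widehat{\Theta}$ we have $\widehat{X} = X$.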

Our proof of Theorem 1 is presented in Section IV. The proof is based on a bipartite graph formulation that
represents the dependencies between the obtained measurements $Y$ and the coefficients in the value vector $\Theta$.
Intuitively, the bipartite graph arises from an interpretation of the matrix $\Upsilon = \Phi P$ as a biadjacency matrix [22].
The graph is fundamental both in the derivation of the number of measurements needed for each sensor and in the
formulation of a combinatorial recovery procedure for the case where $P$ is unknown; we revisit that problem in
Section III-C below.

Although Theorem 1 can be invoked with any feasible location matrix, it yields the most favorable bounds when
invoked using a location matrix that contains just $D$ columns. One implication of this theorem is that, when a
location matrix $P \in \mathcal{P}_F(X)$ is known, reconstruction of a signal ensemble $X$ can be achieved using fewer than
$\|x_j\|_0$ measurements at some or all of the sensors $j$. This highlights the benefit of joint reconstruction in DCS.

Our second main result establishes that the measurement bound presented in Theorem 1 cannot be improved.
We defer the proof of the following theorem to Appendix A.

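Theorem 2: (Converse) Let $X$ denote a signal ensemble, and let $P \in \mathcal{P}_F(X)$ be a full-rank location matrix in
a common/innovation ESM $\mathcal{P}$. For each $j \in \Lambda$, let $\Phi_j$ be an $M_j \times N$ matrix (not necessarily random). If

$$\sum_{j \in \Gamma} M_j < \left( \sum_{j \in \Gamma} K_j(P) \right) + K_C(\Gamma, P) \tag{10}$$

for some nonempty subset $\Gamma \subseteq \Lambda$, then there exists a value vector $\widehat{\Theta}$ such that $Y = \Phi P \widehat{\Theta}$ but $\widehat{X} := P \widehat{\Theta} \ne X$.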
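Theorems 1 and 2 can be read as statements about the column rank of $\Phi P$: the value vector is unique exactly when $\Phi P$ has as many independent columns as $\Theta$ has entries. The following sketch is a numerical illustration of ours (not a proof) for the location matrix (3) of Example 1; the function name `upsilon_for_example_1` and the tested allocations are our own choices.

```python
import numpy as np

def upsilon_for_example_1(M, rng):
    """Return Phi @ P for the location matrix (3) (N = 4, J = 2) and
    Gaussian Phi_j with M[j] rows each."""
    N = 4
    P_C = np.eye(N)[:, [0, 1]]            # common support: indices 1 and 2
    P_1 = np.eye(N)[:, [0]]               # innovation of sensor 1 at index 1
    # P_2 is empty, so P has K_C + K_1 + K_2 = 2 + 1 + 0 = 3 columns.
    P = np.block([[P_C, P_1], [P_C, np.zeros((N, 1))]])
    Phi = np.zeros((sum(M), 2 * N))
    Phi[:M[0], :N] = rng.standard_normal((M[0], N))
    Phi[M[0]:, N:] = rng.standard_normal((M[1], N))
    return Phi @ P

rng = np.random.default_rng(0)
ranks = {M: np.linalg.matrix_rank(upsilon_for_example_1(M, rng))
         for M in [(2, 1), (1, 2), (3, 0), (0, 3), (1, 1)]}
print(ranks)
```

Allocations such as (3, 0) supply three measurements in total, matching the three unknowns, yet the rank stays below three because sensor 1 alone cannot separate the overlapping common and innovation coefficients at index 1.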

Example 2: Consider again the signal ensemble presented in Example 1. For the matrix $P$ specified in (3), the
overlap sizes are $K_C(\{1\}, P) = 0$ (since there is no overlap between common and innovation components in
sensor 2), $K_C(\{2\}, P) = 1$ (since there is overlap in the common and innovation components at sensor 1 for
index 1), and $K_C(\{1, 2\}, P) = K_C(P) = 2$. Alternatively, for the matrix $P$ specified in (4), the overlap sizes are
$K_C(\{1\}, P) = 1$, $K_C(\{2\}, P) = 0$, and $K_C(\{1, 2\}, P) = K_C(P) = 2$. Thus, for a decoder with knowledge of
either one of these location matrices, Theorem 1 tells us that $X$ can be uniquely recovered if $M_1 \ge 1$, $M_2 \ge 1$, and
$M_1 + M_2 \ge 3$. Conversely, Theorem 2 tells us that $X$ cannot be uniquely recovered using either of these location
matrices if $M_1 = 0$, if $M_2 = 0$, or if $M_1 = M_2 = 1$.
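The conditions quoted in Example 2 follow from evaluating (9) over every nonempty subset of sensors. A short sketch of that check, written by us under the assumption that supports are given as 0-based index sets; `overlap_size` and `satisfies_bound_9` are hypothetical helper names.

```python
from itertools import combinations

def overlap_size(gamma, common, innovations):
    """K_C(Gamma, P) computed from supports (see Definition 6)."""
    outside = [j for j in range(len(innovations)) if j not in gamma]
    return sum(all(n in innovations[j] for j in outside) for n in common)

def satisfies_bound_9(M, common, innovations):
    """True if (9) holds for every nonempty subset Gamma of the sensors."""
    J = len(M)
    for size in range(1, J + 1):
        for gamma in combinations(range(J), size):
            needed = (sum(len(innovations[j]) for j in gamma)
                      + overlap_size(gamma, common, innovations))
            if sum(M[j] for j in gamma) < needed:
                return False      # this Gamma triggers the converse (10)
    return True

common = {0, 1}
matrix_3 = [{0}, set()]           # innovation supports for (3)
matrix_4 = [set(), {0}]           # innovation supports for (4)

for M in [(1, 2), (2, 1), (1, 1), (3, 0), (0, 3)]:
    print(M, satisfies_bound_9(M, common, matrix_3),
          satisfies_bound_9(M, common, matrix_4))
```

Because (10) is the negation of (9) for some subset, a `False` result here identifies an allocation covered by Theorem 2.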

C. Identification of a Feasible Location Matrix 

In general, when presented with only the measurements $Y$, it may be necessary for a decoder to find both a
feasible location matrix $P \in \mathcal{P}_F(X)$ and a value vector $\Theta$ such that $X = P\Theta$. Just as identifying the sparse
coefficient locations in single-signal CS can require more measurements than solving for the values if the locations
are known [3], the multi-signal problem of jointly recovering $P$ and $\Theta$ could require more measurements than
specified in Theorem 1 for the case where $P$ is known. Our final main result, however, guarantees that a moderate
increase in the number of measurements beyond the bound specified in (9) is sufficient. The following is proved
in Appendix B.

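Theorem 3: (Achievable, unknown $P$) Let $X$ denote a signal ensemble, and let $\mathcal{P}$ denote a common/innovation
ESM. For each $j \in \Lambda$, let $\Phi_j$ be a random $M_j \times N$ matrix populated with i.i.d. Gaussian entries. If there exists a
full-rank location matrix $P^* \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge \left( \sum_{j \in \Gamma} K_j(P^*) \right) + K_C(\Gamma, P^*) + |\Gamma| \tag{11}$$

for all subsets $\Gamma \subseteq \Lambda$, then $X$ can be recovered from $Y$.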

The achievable measurement bound in (11) can be met by taking just one additional measurement per sensor
above the rate specified in (9); comparing with the converse bound in (10), we see that this is virtually as tight as
possible. Like Theorem 1, Theorem 3 yields the most favorable bounds when invoked using a location matrix that
contains just $D$ columns.

The proof of Theorem 3 involves an algorithm based on an enumerative search over all $P \in \mathcal{P}$; this is akin to the
$\ell_0$ minimization problem in single-signal CS. Indeed, removing the common component and taking $J = 1$, our bound
reduces to the classical single-signal CS result that $K + 1$ Gaussian random measurements suffice with probability
one to enable recovery of a fixed $K$-sparse signal via $\ell_0$ minimization [3, 23]. Although such an algorithm may not
be practically implementable or robust to measurement noise, we believe that our Theorem 3 (taken together with
Theorems 1 and 2) provides a theoretical foundation for understanding the core issues surrounding the measurement
and reconstruction of signal ensembles in the context of ESMs.
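To make the single-signal special case concrete, the sketch below enumerates candidate supports and accepts the first one that explains $K + 1$ Gaussian measurements exactly. It is our own toy illustration of an enumerative $\ell_0$ search, not the algorithm from Appendix B; the function name `l0_search`, the tolerance, and the test signal are assumptions, and the cost grows combinatorially with $N$.

```python
import numpy as np
from itertools import combinations

def l0_search(Phi, y, K, tol=1e-9):
    """Return the sparsest x (at most K nonzeros, assumed nonzero) with
    Phi @ x = y up to tol, found by enumerating supports; None if absent."""
    M, N = Phi.shape
    for k in range(1, K + 1):                       # sparsest candidates first
        for support in combinations(range(N), k):
            cols = list(support)
            A = Phi[:, cols]
            theta, *_ = np.linalg.lstsq(A, y, rcond=None)
            if np.linalg.norm(A @ theta - y) <= tol * np.linalg.norm(y):
                x_hat = np.zeros(N)
                x_hat[cols] = theta
                return x_hat
    return None

rng = np.random.default_rng(1)
N, K = 8, 2
x = np.zeros(N)
x[[2, 6]] = [1.5, -0.7]                             # hypothetical 2-sparse signal
Phi = rng.standard_normal((K + 1, N))               # only K + 1 measurements
x_hat = l0_search(Phi, Phi @ x, K)
print(np.round(x_hat, 6))
```

The one measurement beyond $K$ is what lets the search reject wrong supports: a wrong support of size $K$ would have to fit $K + 1$ random equations exactly, which happens with probability zero.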

Example 3: We once again revisit the signal ensemble presented in Example 1. Using either of the feasible
matrices $P$ specified in (3) or (4) for the purpose of evaluating the bound (11), Theorem 3 tells us that an a priori
unknown feasible location matrix and corresponding value vector can be found to allow perfect recovery of the
signal ensemble $X$, as long as $M_1 \ge 2$, $M_2 \ge 2$, and $M_1 + M_2 \ge 5$. For example, $x_1$ and $x_2$ can be recovered
when $M_1 = 3$ and $M_2 = 2$. For this choice of $M_1$ and $M_2$ and for the location matrix $P$ specified in (3), Figure 1
shows that there exists a matching that associates each element of the value vector $\Theta$ to a unique measurement.
Our exposition of the graph based formulation (see Section IV) explains how the existence of such a matching
ensures perfect recovery of $\Theta$, given $P$, and our proof of Theorem 3 (see Appendix B) explains how the remaining
measurements can be used to identify a feasible location matrix.

[Figure 1: bipartite graph with the value vector coefficients on one side and the measurements $y_1(1)$, $y_1(2)$, $y_1(3)$, $y_2(1)$, $y_2(2)$ on the other.]

Fig. 1. Graphical representation of the dependencies between value vector coefficients and compressive measurements for the
signal ensemble $X$ and location matrix $P$ discussed in Example 3. Each edge in this graph denotes a dependency of a measurement
on a value vector coefficient, but dashed lines indicate dependencies that cannot be exploited due to overlap of common and
innovation coefficients. Among the edges that remain, the thick solid lines indicate the existence of a matching from each value
vector coefficient to a distinct measurement; the existence of such a matching ensures that the system of equations $Y = \Phi P \Theta$ is
invertible (see Theorem 1 and its proof in Section IV). Measurements that remain unassigned in this matching can then be used to
verify the correctness of the solution (see Theorem 3 and its proof in Appendix B).
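The matching described in Example 3 can be found with a standard augmenting-path search on the bipartite graph. The sketch below is our reading of the figure caption, not code from the paper: a common coefficient is allowed to use a sensor's measurements only if that sensor has no innovation coefficient at the same index (the "dashed" edges are dropped). All function names and the tuple labels for coefficients and measurements are hypothetical.

```python
def usable_edges(M, common, innovations):
    """Map each value-vector coefficient to the measurements it may be matched to."""
    J = len(M)
    edges = {}
    for n in sorted(common):
        # Skip sensors whose innovation support overlaps index n.
        edges[("common", n)] = [("y", j, m) for j in range(J)
                                if n not in innovations[j] for m in range(M[j])]
    for j in range(J):
        for n in sorted(innovations[j]):
            # Innovation coefficients are seen only by their own sensor.
            edges[("innov", j, n)] = [("y", j, m) for m in range(M[j])]
    return edges

def maximum_matching(edges):
    """Augmenting-path bipartite matching; returns coefficient -> measurement."""
    owner = {}                                   # measurement -> coefficient

    def try_assign(coef, seen):
        for meas in edges[coef]:
            if meas in seen:
                continue
            seen.add(meas)
            if meas not in owner or try_assign(owner[meas], seen):
                owner[meas] = coef
                return True
        return False

    for coef in edges:
        try_assign(coef, set())
    return {coef: meas for meas, coef in owner.items()}

common, innovations = {0, 1}, [{0}, set()]       # location matrix (3), 0-based
matching = maximum_matching(usable_edges([3, 2], common, innovations))
print(matching)
```

With $M_1 = 3$ and $M_2 = 2$ all three coefficients are matched and two measurements remain unassigned; with $M_2 = 0$ the overlapping common coefficient has no usable edge, so no complete matching exists.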

IV. Central Proof and Bipartite Graph Formulation 

This section is dedicated to proving Theorem 1. In order to prove this theorem, we introduce a bipartite graph
formulation that represents the dependencies between the obtained measurements $Y$ and the coefficients in the value
vector $\Theta$.

A. Proof of Theorem 1 

For brevity, we denote $K_C(P)$ and $K_j(P)$ simply as $K_C$ and $K_j$. We denote the number of columns of $P$ by

$$D' := K_C + \sum_{j=1}^{J} K_j \tag{12}$$

and note that $D' \ge D$. Because $P \in \mathcal{P}_F(X)$, there exists $\Theta \in \mathbb{R}^{D'}$ such that $X = P\Theta$. Because $Y = \Phi X$, $\Theta$ is a
solution to $Y = \Phi P \Theta$.

We will argue that, with probability one over $\Phi$, $\Upsilon := \Phi P$ has rank $D'$, and thus $\Theta$ is the unique solution to
the equation $Y = \Phi P \Theta = \Upsilon \Theta$. To prove that $\Upsilon$ has rank $D'$, we invoke the following lemma, which we prove in
Section IV-B.
Lemma 1: If (9) holds, then there exists a mapping $\mathcal{C} : \{1, 2, \ldots, K_C\} \to \Lambda$, assigning each element of the
common component to one of the sensors, such that for all nonempty subsets $\Gamma \subseteq \Lambda$,

$$\sum_{j \in \Gamma} M_j \ge \sum_{j \in \Gamma} (K_j + C_j), \tag{13}$$

where $C_j := |\{k \in \{1, 2, \ldots, K_C\} : \mathcal{C}(k) = j\}|$ for each $j \in \Lambda$, and such that for each $k \in \{1, 2, \ldots, K_C\}$, the
$k$th column of $P_C$ is not a column of $P_{\mathcal{C}(k)}$.

Intuitively, the existence of such a mapping suggests that (i) each sensor has taken enough measurements to cover
its own innovation component (requiring $K_j$ measurements) and perhaps some of the common component, (ii) for
any $\Gamma \subseteq \Lambda$, the sensors in $\Gamma$ have collectively taken enough extra measurements to cover the requisite $K_C(\Gamma, P)$
elements of the common component, and (iii) the extra measurements are taken at sensors where the common and
innovation components do not overlap. Formally, we will use the existence of such a mapping to prove that $\Upsilon$ has
rank $D'$.

We proceed by noting that $\Upsilon$ has the block structure illustrated in (5), where each $\Phi_j P_C$ (respectively, $\Phi_j P_j$) is
an $M_j \times K_C$ (respectively, $M_j \times K_j$) submatrix of $\Phi_j$ obtained by selecting columns from $\Phi_j$ according to the
columns contained in $P_C$ (respectively, $P_j$). Referring to (12), we see that, in total, $\Upsilon$ has $D'$ columns. To argue
that $\Upsilon$ has rank $D'$, we will consider a sequence of three matrices $\Upsilon_0$, $\Upsilon_1$, and $\Upsilon_2$ constructed from modifications
to $\Upsilon$.

Construction of $\Upsilon_0$: We begin by letting $\Upsilon_0$ denote the "partially zeroed" matrix obtained from $\Upsilon$ using the
following construction:

1) Let $\Upsilon_0 = \Upsilon$ and $k = 1$.

2) For each $j$ such that $P_j$ has a column that matches column $k$ of $P_C$ (note that by Lemma 1 this cannot
happen if $\mathcal{C}(k) = j$), let $k'$ represent the column index of the full matrix $P$ where this column of $P_j$ occurs.
Subtract column $k'$ of $\Upsilon_0$ from column $k$ of $\Upsilon_0$. This forces to zero all entries of $\Upsilon_0$ formerly corresponding
to column $k$ of the block $\Phi_j P_C$.

3) If $k < K_C$, then increment $k$ and go to step 2.

The matrix $\Upsilon_0$ is identical to $\Upsilon$ everywhere except on the first $K_C$ columns, where any portion of a column equal
to a column of $\Phi_j P_j$ to its right has been set to zero.^3 Thus, $\Upsilon_0$ satisfies the next two properties, which will be
inherited by matrices $\Upsilon_1$ and $\Upsilon_2$ that we subsequently define:

P1. Each entry of $\Upsilon_0$ is either zero or a Gaussian random variable.

P2. All Gaussian random variables in $\Upsilon_0$ are i.i.d.

Finally, because $\Upsilon_0$ was constructed only by subtracting columns of $\Upsilon$ from one another, $\mathrm{rank}(\Upsilon_0) = \mathrm{rank}(\Upsilon)$.
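The column-subtraction step can be traced on a small instance. The sketch below is our own illustration using the location matrix (3) of Example 1 with $M_1 = 3$ and $M_2 = 2$ (sizes chosen by us); it forms $\Phi P$ directly from column selections of the $\Phi_j$ and applies the single subtraction that the construction calls for.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M1, M2 = 4, 3, 2
Phi1 = rng.standard_normal((M1, N))
Phi2 = rng.standard_normal((M2, N))

# Phi @ P for location matrix (3): columns are [theta_C(1), theta_C(2), theta_1(1)].
# Phi_j P_C selects columns 1 and 2 of Phi_j; Phi_1 P_1 selects column 1 of Phi_1;
# sensor 2 has no innovation, so its block under Phi_1 P_1 is zero.
Upsilon = np.block([[Phi1[:, [0, 1]], Phi1[:, [0]]],
                    [Phi2[:, [0, 1]], np.zeros((M2, 1))]])

# Column k = 1 of P_C matches the column of P_1, which is column k' = 3 of P
# (0-based: columns 0 and 2). Subtract column k' from column k.
Upsilon0 = Upsilon.copy()
Upsilon0[:, 0] -= Upsilon0[:, 2]

print(Upsilon0[:M1, 0])                               # sensor-1 part is now zero
print(np.linalg.matrix_rank(Upsilon), np.linalg.matrix_rank(Upsilon0))
```

The sensor-1 rows of the first column cancel exactly, while the sensor-2 rows of that column are untouched because the subtracted column is zero there; the rank is unchanged since only an elementary column operation was applied.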

Construction of $\Upsilon_1$: We now let $\Upsilon_1$ be the matrix obtained from $\Upsilon_0$ using the following construction: For each
$j \in \Lambda$, we select $K_j + C_j$ arbitrary rows from the portion of $\Upsilon_0$ corresponding to sensor $j$ (the first $M_1$ rows of
$\Upsilon_0$ correspond to sensor 1, the following $M_2$ rows correspond to sensor 2, and so on). The resulting matrix $\Upsilon_1$ has

$$\sum_{j=1}^{J} (K_j + C_j) = \left( \sum_{j=1}^{J} K_j \right) + K_C = D'$$

rows; note that this is fewer than the number of rows in $\Upsilon_0$ if $K_j + C_j < M_j$ for any $j$. Also, because $\Upsilon_1$
was obtained by selecting a subset of rows from $\Upsilon_0$, it has $D'$ columns (just like $\Upsilon_0$) and satisfies $\mathrm{rank}(\Upsilon_1) \le
\mathrm{rank}(\Upsilon_0) = \mathrm{rank}(\Upsilon)$.

^3 We later show that with probability one, none of the columns become entirely zero.

Construction of T_2: We now let T_2 be the D' × D' matrix obtained by permuting columns of T_1 using the 
following construction: 

1) Let T_2 = [ ], and let j = 1. 

2) For each k such that C(k) = j, let T_1(k) denote the k-th column of T_1, and concatenate T_1(k) to T_2, i.e., 
let T_2 ← [T_2 T_1(k)]. There are C_j such columns. 

3) Let T_{1,j} denote the columns of T_1 corresponding to the entries of Φ_j P_j (the innovation components of 
sensor j), and concatenate T_{1,j} to T_2, i.e., let T_2 ← [T_2 T_{1,j}]. There are K_j such columns. 

4) If j < J, then increment j and go to Step 2. 

In total, Step 2 chooses Σ_{j=1}^{J} C_j = K_C columns, while Step 3 chooses Σ_{j=1}^{J} K_j columns, and thus referring to 
(12), T_2 has K_C + Σ_{j=1}^{J} K_j = D' columns. The number of rows is the same as that of T_1, making T_2 a square 
matrix. Because T_1 and T_2 share the same columns up to reordering, it follows that 

\[ \operatorname{rank}(T_2) = \operatorname{rank}(T_1) \le \operatorname{rank}(T). \tag{14} \]

Based on its dependency on T_0, and following from Lemma 1, T_2 meets properties P1 and P2 defined above in 
addition to a third property: 

P3. All entries along the main diagonal of T_2 are Gaussian random variables (none are deterministically zero). 

Property P3 follows because each diagonal element of T_2 will either be an entry of some Φ_j P_j, which remains 
Gaussian throughout our constructions, or it will be an entry of some k-th column of some Φ_j P_C for which C(k) = j. 
In the latter case, we know by Lemma 1 and the construction of T_0 (Step 2) that the k-th column of Φ_j P_C is not 
zeroed out, and thus the corresponding diagonal entry remains Gaussian throughout our constructions. 

Having identified these three properties satisfied by T_2, we will prove by induction that, with probability one 
over Φ, such a matrix has full rank. 

Lemma 2: Let T^(d−1) be a (d − 1) × (d − 1) matrix having full rank. Construct a d × d matrix T^(d) as follows: 

\[ T^{(d)} := \begin{bmatrix} T^{(d-1)} & v_1 \\ v_2^T & \omega \end{bmatrix}, \]

where v_1, v_2 ∈ R^{d−1} are column vectors with each entry being either zero or a Gaussian random variable, ω is 
a Gaussian random variable, and all random variables are i.i.d. and independent of T^(d−1). Then with probability 
one, T^(d) has full rank. 
Applying Lemma 2 inductively D' times, the success probability remains one. It follows that with probability 
one over Φ, rank(T_2) = D'. Combining this last result with (14), and recalling that T has only D' columns, we 
conclude that rank(T) = D' with probability one over Φ. It remains to prove Lemma 2. 
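The rank claim can be checked numerically on a small ensemble. The sketch below is illustrative only and assumes that P has the common/innovation block form [[P_C, P_1, 0], [P_C, 0, P_2]] with identity columns and that Φ is block diagonal with i.i.d. Gaussian blocks Φ_j. The helper names and the chosen sizes (N = 6, K_C = 2, K_1 = K_2 = 1, M_1 = M_2 = 2) are hypothetical; with these choices D' = 4 and no common index overlaps an innovation index.

```python
import numpy as np

def location_matrix(N, common, innovations):
    """Stack one block row [P_C, 0, ..., P_j, ..., 0] per sensor (assumed form of P)."""
    I = np.eye(N)
    P_C = I[:, common]
    block_rows = []
    for j in range(len(innovations)):
        row = [P_C]
        for i, s in enumerate(innovations):
            P_i = I[:, s]
            row.append(P_i if i == j else np.zeros_like(P_i))
        block_rows.append(np.hstack(row))
    return np.vstack(block_rows)

def block_diagonal_gaussian(M, N, rng):
    """Block-diagonal Phi with an M_j x N i.i.d. Gaussian block per sensor."""
    Phi = np.zeros((sum(M), len(M) * N))
    r = 0
    for j, Mj in enumerate(M):
        Phi[r:r + Mj, j * N:(j + 1) * N] = rng.standard_normal((Mj, N))
        r += Mj
    return Phi

rng = np.random.default_rng(1)
N, common, innovations, M = 6, [0, 1], [[2], [3]], [2, 2]
P = location_matrix(N, common, innovations)
D_prime = P.shape[1]

# Draw several independent Phi and record rank(Phi P) each time.
ranks = [np.linalg.matrix_rank(block_diagonal_gaussian(M, N, rng) @ P)
         for _ in range(20)]
all_full_rank = all(r == D_prime for r in ranks)
```

Every draw should give rank D', matching the probability-one statement up to floating-point rank estimation.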

Proof of Lemma 2: When d = 1, T^(1) = [ω], which has full rank if and only if ω ≠ 0, which occurs with 
probability one. 

When d > 1, using expansion by minors, the determinant of T^(d) satisfies 

\[ \det(T^{(d)}) = \omega \det(T^{(d-1)}) + C, \]

where C = C(T^(d−1), v_1, v_2) is independent of ω. The matrix T^(d) has full rank if and only if det(T^(d)) ≠ 0, 
which is satisfied if and only if 

\[ \omega \ne \frac{-C}{\det(T^{(d-1)})}. \]

By the inductive assumption, det(T^(d−1)) ≠ 0 and ω is a Gaussian random variable that is independent of C and 
det(T^(d−1)). Thus, ω ≠ −C / det(T^(d−1)) with probability one. □ 
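The determinant identity used in the proof can be verified numerically. This is a minimal sketch, assuming a dense Gaussian matrix stands in for T^(d−1) and that roughly half the entries of v_1 and v_2 are zeroed; the function name `bordered` is hypothetical. The constant C is obtained as the determinant of the bordered matrix with ω = 0.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
T_prev = rng.standard_normal((d - 1, d - 1))                  # plays the role of T^(d-1)
v1 = rng.standard_normal(d - 1) * (rng.random(d - 1) < 0.5)   # entries are zero or Gaussian
v2 = rng.standard_normal(d - 1) * (rng.random(d - 1) < 0.5)

def bordered(omega):
    """Assemble T^(d) = [[T^(d-1), v1], [v2^T, omega]]."""
    return np.block([[T_prev, v1[:, None]],
                     [v2[None, :], np.array([[omega]])]])

C = np.linalg.det(bordered(0.0))          # the part of det(T^(d)) that does not involve omega
omega = rng.standard_normal()
lhs = np.linalg.det(bordered(omega))
rhs = omega * np.linalg.det(T_prev) + C

# The single value of omega that makes T^(d) singular.
omega_bad = -C / np.linalg.det(T_prev)
det_bad = np.linalg.det(bordered(omega_bad))
```

Since a continuous random variable hits the single value `omega_bad` with probability zero, T^(d) is full rank with probability one.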

B. Proof of Lemma 1 

To prove Lemma 1, we apply tools from graph theory. 

We introduce a bipartite graph G = (V_V, V_M, E) that captures the dependencies between the entries of the value 
vector Θ ∈ R^{D'} and the entries of the measurement vector Y = ΦPΘ. This graph is defined as follows. The set 
of value vertices V_V has elements with indices d ∈ {1, ..., D'} representing the entries Θ(d) of the value vector. 
The set of measurement vertices V_M has elements with indices (j, m) representing the measurements y_j(m), with 
j ∈ Λ and m ∈ {1, ..., M_j} (the range of possible m varies depending on j). The cardinalities for these sets are 
|V_V| = D' and |V_M| = Σ_{j∈Λ} M_j. Finally, the set of edges E is defined according to the following rules: 

• For every d ∈ {1, 2, ..., K_C} ⊆ V_V and j ∈ Λ such that column d of P_C does not also appear as a column 
of P_j, we have an edge connecting d to each vertex (j, m) ∈ V_M for 1 ≤ m ≤ M_j. 

• For every d ∈ {K_C + 1, K_C + 2, ..., D'} ⊆ V_V, we consider the sensor j associated with column d of P, 
and we have an edge connecting d to each vertex (j, m) ∈ V_M for 1 ≤ m ≤ M_j. 

An example graph for a distributed sensing setting appears in Figure 2. 

We seek a matching within the bipartite graph G = (V_V, V_M, E), namely, a subgraph (V_V, V_M, Ē) with Ē ⊆ E 
that pairs each element of V_V with a unique element of V_M. Such a matching will immediately give us the desired 
mapping C as follows: for each k ∈ {1, 2, ..., K_C} ⊆ V_V, let (j, m) ∈ V_M denote the single vertex matched to k 
by an edge in Ē; we then set C(k) = j. 
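The graph and the induced mapping C can be sketched in code. The example below is illustrative and not part of the original analysis: it builds the edge set from the two rules above and finds a matching with a standard augmenting-path search, which is one of several ways to compute a bipartite matching. Sensors and indices are 0-based here, and the function names and the example ensemble (K_C = 2, K_1 = K_2 = 1, M_1 = M_2 = 2) are hypothetical.

```python
def build_graph(common, innovations, M):
    """Adjacency list: value vertex d -> measurement vertices (j, m)."""
    J = len(M)
    adj = []
    for k in common:
        # Common component: connected to sensor j unless the same
        # column also appears in P_j.
        adj.append([(j, m) for j in range(J) if k not in innovations[j]
                    for m in range(M[j])])
    for j in range(J):
        for _ in innovations[j]:
            # Innovation component: connected only to its own sensor.
            adj.append([(j, m) for m in range(M[j])])
    return adj

def max_matching(adj):
    """Augmenting-path matching; returns value vertex -> matched (j, m)."""
    owner = {}                                  # measurement vertex -> value vertex

    def try_assign(d, seen):
        for v in adj[d]:
            if v in seen:
                continue
            seen.add(v)
            if v not in owner or try_assign(owner[v], seen):
                owner[v] = d
                return True
        return False

    for d in range(len(adj)):
        try_assign(d, set())
    return {d: v for v, d in owner.items()}

common, innovations, M = [0, 1], [[2], [3]], [2, 2]
adj = build_graph(common, innovations, M)
match = max_matching(adj)

# C(k) = sensor index of the measurement vertex matched to common vertex k.
C = {k: match[k][0] for k in range(len(common))}
C_count = [sum(1 for j in C.values() if j == s) for s in range(len(M))]
```

In this example every value vertex is matched, and each sensor j ends up with M_j ≥ K_j + C_j, where C_j is the number of common components assigned to it.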

To prove the existence of such a matching within the graph, we invoke a version of Hall's marriage theorem for 
bipartite graphs [24]. Hall's theorem states that within a bipartite graph (V_1, V_2, E), there exists a matching that 
assigns each element of V_1 to a unique element of V_2 if for any collection of elements Π ⊆ V_1, the set E(Π) of 
neighbors of Π in V_2 has cardinality |E(Π)| ≥ |Π|. To apply Hall's theorem in the context of our lemma, we will 
show that if (9) is satisfied, then for any set Π ⊆ V_V of entries in the value vector, the set E(Π) of neighbors of 
Π in V_M has size |E(Π)| ≥ |Π|. 

Fig. 2. The bipartite graph G = (V_V, V_M, E) indicates the relationship between the value vector coefficients and the measurements. 

Let us consider an arbitrary set Π ⊆ V_V. We let S_Π = {j ∈ Λ : (j, m) ∈ E(Π) for some m} ⊆ Λ denote the set 
of signal indices whose measurement vertices have edges that connect to Π. Since a connection between a value 
vertex and a measurement vertex at a given sensor implies a connection to all other measurement vertices for that 
sensor, it follows that |E(Π)| = Σ_{j∈S_Π} M_j. Thus, in order to satisfy Hall's condition for Π, we require 

\[ \sum_{j \in S_\Pi} M_j \ge |\Pi|. \tag{15} \]

We would now like to show that Σ_{j∈S_Π} K_j + K_C(S_Π, P) ≥ |Π|, and thus if (9) is satisfied for all Γ ⊆ Λ, then 
(15) is satisfied in particular for S_Π ⊆ Λ. 

In general, the set Π may contain vertices for both common components and innovation components. We write 
Π = Π_C ∪ Π_I to denote the disjoint union of these two sets. 

By construction, |Π_I| ≤ Σ_{j∈S_Π} K_j, because Π_I cannot include any innovation component outside the set of 
sensors S_Π. We will also argue that |Π_C| ≤ K_C(S_Π, P) as follows. By definition, for a set Γ ⊆ Λ, K_C(Γ, P) 
counts the number of columns in P_C that also appear in P_j for all j ∉ Γ. By construction, for each k ∈ Π_C, vertex 
k has no connection to vertices (j, m) for j ∉ S_Π, and so it must follow that the k-th column of P_C is present in P_j 
for all j ∉ S_Π. Thus, the index k is among the indices counted in the definition (8) of K_C(S_Π, P), and therefore 
|Π_C| ≤ K_C(S_Π, P). 

We conclude that |Π| = |Π_I| + |Π_C| ≤ Σ_{j∈S_Π} K_j + K_C(S_Π, P), and so (9) implies (15) for any Π. Hence Hall's 
condition is satisfied, and a matching exists. Finally, consider any set Γ ⊆ Λ. To confirm that (13) holds for this set, 
note that there are a total of Σ_{j∈Γ} M_j vertices (j, m) ∈ V_M such that j ∈ Γ. Each of these vertices is matched to 
at most one vertex in V_V, which must correspond either to an innovation component counted in K_j for some j ∈ Γ 
or to a common component indexed by some k such that C(k) ∈ Γ. It follows that Σ_{j∈Γ} M_j ≥ Σ_{j∈Γ} (K_j + C_j). 

□ 
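For small ensembles, the implication from (9) to Hall's condition can be checked by brute force. The sketch below is illustrative: it assumes that condition (9) reads Σ_{j∈Γ} M_j ≥ Σ_{j∈Γ} K_j + K_C(Γ, P) for every nonempty Γ ⊆ Λ, as the argument above suggests, and it represents the ensemble only by hypothetical index lists (`common`, `innovations`) with 0-based sensors.

```python
from itertools import combinations, product

def K_C_of(Gamma, common, innovations):
    """Count common columns that also appear in P_j for every j outside Gamma."""
    outside = [j for j in range(len(innovations)) if j not in Gamma]
    return sum(all(k in innovations[j] for j in outside) for k in common)

def condition_9(M, common, innovations):
    """Check the measurement bound for every nonempty subset of sensors."""
    J = len(M)
    for r in range(1, J + 1):
        for Gamma in combinations(range(J), r):
            need = (sum(len(innovations[j]) for j in Gamma)
                    + K_C_of(Gamma, common, innovations))
            if sum(M[j] for j in Gamma) < need:
                return False
    return True

def hall_condition(M, common, innovations):
    """Check |E(Pi)| >= |Pi| for every nonempty set Pi of value vertices."""
    J = len(M)
    # Sensors adjacent to each value vertex (common vertices first).
    sensors = [{j for j in range(J) if k not in innovations[j]} for k in common]
    sensors += [{j} for j in range(J) for _ in innovations[j]]
    D = len(sensors)
    for r in range(1, D + 1):
        for Pi in combinations(range(D), r):
            S = set().union(*(sensors[d] for d in Pi))
            if sum(M[j] for j in S) < len(Pi):
                return False
    return True

# Two small ensembles: one without overlap, one where the common index
# coincides with the innovation index at the first sensor.
examples = [([0, 1], [[2], [3]]), ([0], [[0], [1]])]
implication_holds = all(
    hall_condition(M, c, inn)
    for c, inn in examples
    for M in product(range(4), repeat=2)
    if condition_9(M, c, inn)
)
```

In the second ensemble the first sensor cannot separate the common component from its own innovation, so the bound forces the second sensor to take at least two measurements.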
V. Discussion 

In this paper, we have introduced the ensemble sparsity model (ESM) framework for modeling intra- and inter- 
signal correlations within a collection of sparse signals. This framework is based on a factored representation of 
the signal ensemble that decouples its location information from its value information. We have also proposed an 
analytical framework based on bipartite graphs that allowed us, in the context of a common/innovation ESM 𝒫, to 
characterize the numbers of measurements M_1, M_2, ..., M_J needed for successful recovery of a signal ensemble 
X. Our bounds highlight the benefit of joint reconstruction in distributed compressive sensing (DCS), since sparse 
signals can be recovered from fewer measurements than their nominal sparsity level would indicate. 

The factored representation that we have proposed for modeling sparse signal ensembles is closely related to the 
recently proposed union-of-subspaces modeling frameworks for CS [25-28]. What is particularly novel about our 
treatment is the explicit consideration of the block structure of matrices such as P and Φ and the explicit accounting 
for measurement bounds on a sensor-by-sensor basis. Most of the conventional union-of-subspaces theory in CS 
is intended to characterize the number of measurements required to recover a vector X from measurements ΦX, 
where Φ is a dense matrix. 

Finally, as we have discussed in Section II-E, common/innovation ESMs can be populated using various choices 
of matrices P of the form (1). Our bounds in Section III are relatively agnostic to such design choices. Past 
experience, however, has indicated that practical algorithms for signal recovery can benefit from being tuned to 
the particular type of signal correlations under consideration [3, 11-14, 18-21, 29-33]. Our focus in this paper has 
been not on tractable recovery algorithms or robustness issues, but rather on foundational limits governing how the 
measurements may be amortized across the sensors while preserving the information required to uniquely identify 
the sparse signal ensemble. However, we believe that our results and our analytical framework may pave the way 
for a better, perhaps more unified, development of practical DCS algorithms. 

Appendix A 
Proof of Theorem 2 

As in (12), we let D' denote the number of columns in P. Because P ∈ 𝒫_F(X), there exists Θ ∈ R^{D'} such that 
X = PΘ. Because Y = ΦX, then Θ is a solution to Y = ΦPΘ. We will argue for T := ΦP that rank(T) < D', 
and thus there exists Θ̂ ≠ Θ such that Y = TΘ = TΘ̂. Since P has full rank, it follows that X̂ := PΘ̂ ≠ PΘ = X. 

We let T_0 be the "partially zeroed" matrix obtained from T using the identical procedure detailed in Section IV-A. 
Again, because T_0 was constructed only by subtracting columns of T from one another, it follows that rank(T_0) = 
rank(T). 

Suppose that Γ ⊆ Λ is a set for which (10) holds. We let T_3 be the submatrix of T_0 obtained by selecting the 
following columns: 

• For any k ∈ {1, 2, ..., K_C} such that column k of P_C also appears as a column in P_j for all j ∉ Γ, we 
include column k of T_0 as a column in T_3. There are K_C(Γ, P) such columns k. 

• For any k ∈ {K_C + 1, K_C + 2, ..., D'} such that column k of P corresponds to an innovation for some sensor 
j ∈ Γ, we include column k of T_0 as a column in T_3. There are Σ_{j∈Γ} K_j such columns k. 

This submatrix has Σ_{j∈Γ} K_j + K_C(Γ, P) columns. Because T_0 has the same size as T (see Section IV-A), and 
in particular has only D' columns, then in order to have that rank(T_0) = D', it is necessary that all Σ_{j∈Γ} K_j + 
K_C(Γ, P) columns of T_3 be linearly independent. 

Based on the method described for constructing T_0, it follows that T_3 is zero for all measurement rows not 
corresponding to the set Γ. These rows were nonzero only for two sets of columns of T_0: (i) the columns corresponding 
to the innovations for signals j ∉ Γ, and (ii) the columns k ∈ {1, 2, ..., K_C} for which the k-th column of P_C 
is absent from at least one of the matrices P_j, j ∉ Γ. Both of these sets of columns are discarded during the 
construction of T_3. Therefore, consider the submatrix T_4 of T_3 obtained by selecting only the measurement rows 
corresponding to the set Γ. Because all the rows discarded from T_3 are zero, it follows that rank(T_3) = rank(T_4). 
However, since T_4 has only Σ_{j∈Γ} M_j rows, we invoke (10) and have that 

\[ \operatorname{rank}(T_3) = \operatorname{rank}(T_4) \le \sum_{j \in \Gamma} M_j < \sum_{j \in \Gamma} K_j + K_C(\Gamma, P). \]

Thus, all Σ_{j∈Γ} K_j + K_C(Γ, P) columns of T_3 cannot be linearly independent, and so T does not have full rank. 
This means that there exists Θ̂ ≠ Θ such that Y = TΘ = TΘ̂, and thus we cannot distinguish between the two 
solutions X̂ := PΘ̂ ≠ PΘ = X. □ 
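A small numerical example makes the rank deficiency concrete. This sketch assumes J = 2 sensors with N = 3, a single common component at index 0, an innovation at index 0 for sensor 1, and an innovation at index 1 for sensor 2, so that K_C({2}, P) = 1. Taking M_1 = 2 and M_2 = 1 gives D' = 3 total measurements, yet sensor 2 falls short of K_2 + K_C({2}, P) = 2. The block layout of T and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
Phi1 = rng.standard_normal((2, 3))   # M_1 = 2 rows for sensor 1
Phi2 = rng.standard_normal((1, 3))   # M_2 = 1 row for sensor 2 (too few)

# T = Phi P with columns ordered as [common, innovation 1, innovation 2].
T = np.vstack([
    np.hstack([Phi1[:, [0]], Phi1[:, [0]], np.zeros((2, 1))]),
    np.hstack([Phi2[:, [0]], np.zeros((1, 1)), Phi2[:, [1]]]),
])
D_prime = T.shape[1]
rank_T = np.linalg.matrix_rank(T)

# The right singular vector for the smallest singular value spans the null space here.
null_vec = np.linalg.svd(T)[2][-1]
theta = np.array([1.0, -2.0, 0.5])
theta_hat = theta + null_vec          # a different value vector, same measurements
```

Here `T @ theta` and `T @ theta_hat` coincide even though the two value vectors differ, so the measurements cannot distinguish the corresponding ensembles.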

Appendix B 
Proof of Theorem 3 

Given the measurements Y and measurement matrix Φ, we will show that it is possible to recover some P ∈ 
𝒫_F(X) and a corresponding vector Θ such that X = PΘ using the following algorithm. 

• Extract from each measurement vector y_j its final entry, and sum these entries to obtain the quantity ȳ = 
Σ_{j∈Λ} y_j(M_j). Similarly, add the corresponding rows of Φ into a single row φ̄^T. The row vector φ̄^T is a 
concatenation of the final rows of the matrices Φ_j, and thus its entries are i.i.d. Gaussian. Note that ȳ = φ̄^T X; 
this quantity will be used in a cross-validation step below. 

• Group the remaining (Σ_{j∈Λ} M_j) − J measurements into a vector Ỹ, and let Φ̃ contain the corresponding 
rows of Φ. We note that φ̄ is independent from Φ̃ and that Ỹ = Φ̃X. 

• For each matrix P ∈ 𝒫 such that Ỹ ∈ colspan(Φ̃P), choose a single solution Θ_P to Ỹ = Φ̃PΘ_P independently 
of φ̄. Then, perform the following cross-validation: if ȳ = φ̄^T PΘ_P, then return the estimate X̂ = PΘ_P; 
otherwise, continue with the next matrix P. 

We begin by noting that there exists at least one matrix P ∈ 𝒫 for which Ỹ ∈ colspan(Φ̃P) and for which 
X = PΘ_P. In particular, consider the matrix P* ∈ 𝒫_F(X) mentioned in the theorem statement. Because (11) 
holds for P*, Theorem 1 guarantees that with probability one, Φ̃P* will have full rank, and so there is a unique 
solution Θ_{P*} to Ỹ = Φ̃P*Θ_{P*}. Since P* ∈ 𝒫_F(X) and P* is full rank, we know that X = P*Θ_{P*}. Also, since 
Y = ΦX, we know that ȳ = φ̄^T P*Θ_{P*}, and so this matrix will clear the cross-validation step. 

Now suppose that, for some P ∈ 𝒫, the algorithm above considers a candidate solution Θ_P to Ỹ = Φ̃PΘ_P, 
but suppose also that X ≠ PΘ_P. The algorithm will fail to discard this incorrect solution if Θ_P passes the cross- 
validation test, i.e., if φ̄^T PΘ_P = ȳ = φ̄^T X. Recall, however, that φ̄ is an i.i.d. Gaussian random vector and that 
it is independent of both X and PΘ_P. It then follows that φ̄ is orthogonal to X − PΘ_P with probability zero, 
and therefore we will have φ̄^T (X − PΘ_P) ≠ 0 (equivalently, φ̄^T PΘ_P ≠ ȳ) with probability one. Therefore, this 
incorrect solution will be discarded with probability one. Since 𝒫 contains only a finite number of matrices, the 
probability of cross-validation discarding all incorrect solutions remains one. □ 
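The recovery procedure in this appendix can be sketched for a toy ensemble. Everything specific in the sketch is an assumption made for illustration: J = 2 sensors, N = 4, one common and one innovation component per sensor, a candidate set consisting of all index triples (c, i1, i2), M = (3, 2) measurements, least squares as the rule for choosing a solution, and a numerical tolerance in place of exact equality. It is not the authors' implementation, and the exhaustive search is not meant to be tractable.

```python
import itertools
import numpy as np

def location_matrix(N, c, i1, i2):
    """Assumed form of P for J = 2: common index c, innovation indices i1 and i2."""
    I = np.eye(N)
    z = np.zeros((N, 1))
    return np.block([[I[:, [c]], I[:, [i1]], z],
                     [I[:, [c]], z, I[:, [i2]]]])

rng = np.random.default_rng(4)
N, M = 4, [3, 2]
X = location_matrix(N, 0, 1, 2) @ np.array([1.5, -2.0, 0.7])   # true ensemble

Phi = np.zeros((sum(M), 2 * N))                                 # block-diagonal Gaussian
Phi[:M[0], :N] = rng.standard_normal((M[0], N))
Phi[M[0]:, N:] = rng.standard_normal((M[1], N))
Y = Phi @ X

# Hold out the final measurement of each sensor for cross-validation.
last = [M[0] - 1, M[0] + M[1] - 1]
rest = [r for r in range(sum(M)) if r not in last]
phi_bar, y_bar = Phi[last].sum(axis=0), Y[last].sum()
Phi_t, Y_t = Phi[rest], Y[rest]

def recover(tol=1e-8):
    for c, i1, i2 in itertools.product(range(N), repeat=3):
        P = location_matrix(N, c, i1, i2)
        # The solution is chosen without looking at phi_bar.
        theta = np.linalg.lstsq(Phi_t @ P, Y_t, rcond=None)[0]
        if np.linalg.norm(Phi_t @ P @ theta - Y_t) > tol:
            continue                      # Y_t is not in colspan(Phi_t P)
        if abs(phi_bar @ P @ theta - y_bar) <= tol:
            return P @ theta              # candidate passed cross-validation
    return None

X_hat = recover()
```

With these sizes the reduced system has as many equations as unknowns, so many wrong candidates fit the reduced measurements exactly and it is the held-out cross-validation measurement that rejects them.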